1. INTRODUCTION

As the coronavirus disease (COVID-19) pandemic has spread across the globe, it has caused a significant
degree of fear and concern among the public. In terms of public mental health, elevated depression rates are
the most significant psychological effect to date. Younger adults had higher rates of mental health problems,
and adults enduring serious health issues also had more mental health problems [1]. The analysis showed that
mental health problems decreased by 5% with every one-year increase in age [2]. Children from lower
socio-economic classes who were exposed early in their lives to mental health problems in one or both parents
were more likely to become mentally ill later in life. Mood disorders and suicide-related outcomes have soared
over the past decade [3], [4]. According to the Institute for Public Health, the prevalence of mental health
disorders among adults has risen worryingly, from 10.7% in 1996 to 29.2% in 2015 [5].

Depression, the most common type of mental illness, is a psychological condition that can affect
anyone at any age for reasons such as loss of self-esteem and the social environment. The symptoms faced by
depressed individuals differ significantly from usual mood variations and may severely affect their capability
to deal with everyday life. Depression affects not only physical but also psychological well-being [6]. It is
associated with diabetes, hypertension, and back pain [7]. Besides that, mental illness often places a burden
on families, friends, caregivers, and other relationships in the form of tension, marriage breakdown, or
homelessness [8]. Therefore, initiative and commitment to the prevention and treatment of depression are
necessary.

Considering its incidence and seriousness, depression is one of the least diagnosed of the leading
mental illnesses. The diagnosis and evaluation of signs of depression rely almost exclusively on information
provided by patients, family members, friends, or caregivers [9]. This type of report, however, can be
inaccurate because it relies entirely on the reporter’s integrity. Depression-related self-perceived shame is
widespread in societies worldwide and is associated with unwillingness to seek professional assistance [10].
Patients are also hesitant to express their depressive feelings to physicians, so a discussion of depression
often relies heavily on a general practitioner’s willingness to engage with the patient. The prevalence of
depression in Malaysia is considerably higher than in the United States and most other Western countries [11].
Depression is a severe mental illness and a significant public health issue that has a massive effect on
society. In the worst case, depression can lead to suicide. Even though it is a severe psychological issue,
fewer than half of the people with this condition have received mental health services [6]. This may be
attributed to various reasons, including lack of knowledge of the disease. Additionally, researchers
discovered that embarrassment and self-stigmatization tend to be more significant factors in not obtaining
medical attention than others’ actual prejudice and adverse reactions [12].

The capability to predict depression using machine learning algorithms before conditions worsen is
essential. Therefore, in this paper, we conducted a systematic review of literature from 2016 to 2021 (time of
writing) to help researchers better understand this area. This review aims, firstly, to identify variables
relevant to the prediction of depression using machine learning techniques; secondly, to identify the latest
and most frequent screening types used in detecting depression; and finally, to highlight popular
state-of-the-art machine learning techniques for predicting depression based on chosen metrics and values of
performance.

Using machine learning techniques for the prediction of medical conditions is not new. Recent
publications show applications in hepatitis [13], autism [14] and cancer [15]. Nevertheless, the approach is
not without weaknesses. The primary weakness of any prediction pipeline involving machine learning techniques
is its substantial dependence on correctly annotated data. If a dataset is small, manually annotating each
data point is feasible; in this big data era, however, manual annotation has become impractical. Since machine
learning techniques are trained on these annotations, a dataset with low-quality labels can result in
unreliable predictions. Another weakness is the risk of overfitting. In the pursuit of higher prediction
performance, these techniques can tend to induce a model fitted to specific, unique data points that do not
represent a large portion of the population, thus rendering the model useless.

Our contribution via this study is a systematic review covering key aspects in predicting depression. 
Significant variables in previous works are identified, depression screening tools used are investigated and 
popular machine learning algorithms based on classical as well as new measurements of performance are 
highlighted. 

The paper is outlined as follows: in section 2, the systematic literature review methodology is 
explained. Our proposed methodology and research questions are detailed in section 3. Then, the results of 
our review are presented in section 4. Finally, in section 5 we conclude this paper. 


2. SYSTEMATIC MAPPING STUDY (SMS) METHOD 

The systematic mapping study (SMS) method organizes published research and its results into
structured categories by systematically perusing the primary contents, methodology and results, with the aim
of mitigating bias and drawing conclusions using statistical meta-analysis supported by evidence [16].
Although originally introduced for medical research, the SMS method has been adapted for computing. Figure 1
shows the three primary phases of the SMS method used in our study. Each phase produces an outcome which in
turn triggers the next phase.

The SMS method begins with the formulation of research questions so that the coverage of existing
literature can be framed. Once the scope of the review has been determined, a literature search is conducted,
which involves defining information sources from various academic online databases, digital libraries, and
search engines. These sources are explored using search terms that are constructed to encompass the earlier
formulated research questions and combined using Boolean operators. All the papers extracted are then
screened based on their keywords, abstract, introduction and conclusion sections to identify only the relevant
papers that can provide answers to the research questions.
3. RESEARCH METHODOLOGY

In this section, we describe how we applied the SMS method to systematically review existing
literature from 2016 till 2020 (time of writing). The steps are defining research questions, literature search
and screening papers. In each following subsection, we describe in detail the input, activity and output
involved in each step. Finally, we illustrate the summarized evolution of the paper filtration process used to
obtain the final relevant papers for review.


Phases: Define research questions -> Literature search -> Screen papers
Outcomes: Literature review scope -> All papers -> Relevant papers

Figure 1. Phases of the SMS method


3.1. Define research questions 

At this phase, research questions were formed to seek literature within the scope of predicting
depression using machine learning methods. The first question concerns which variables were used by recent
proposals for the prediction process. The answer allows researchers to identify relevant variables, and a good
selection of variables helps to produce good prediction performance. The second question is which depression
screening tools were adopted. This question provides an understanding of which screening tools have been
continuously used by researchers and how many of the proposals do not utilize any screening tool. From the
answer to this question, researchers can decide whether adopting a specific screening tool is necessary for
their work. The final question is which machine learning techniques were proposed by existing research. This
question helps direct researchers to state-of-the-art machine learning techniques applied to depression
prediction. Table 1 lists the constructed research questions and the motivations behind them.


Table 1. Research questions and motivation

Research questions | Motivation
RQ1: What variables were used by recent proposals in predicting depression? | The answer to this question allows researchers to identify variables relevant to the prediction of depression.
RQ2: Which depression screening tools were adopted? | The answer to this question identifies the latest and most frequent screening types used in detecting depression.
RQ3: What machine learning techniques were proposed by existing research? | The answer to this question provides researchers with popular state-of-the-art techniques in machine learning to predict depression based on chosen metrics and values of performance.


3.2. Literature search 

A thorough search was conducted on four prominent electronic databases utilizing the following
keywords: “depression prediction”, “mental health prediction”, and “anxiety, depression, and stress
prediction”. The keywords were combined using the Boolean AND and OR operators. The databases searched
were: IEEE Xplore (http://ieeexplore.ieee.org), ACM Digital Library (http://www.portal.acm.org/dl.cfm),
Elsevier ScienceDirect (http://www.sciencedirect.com), and Google Scholar (http://scholar.google.com).
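
The combination of keywords with Boolean operators can be sketched in a few lines of Python. This is only an illustration: the review does not state the exact search strings submitted to each database, so the helper `build_query` and the extra term "machine learning" are assumptions made for the example.

```python
def build_query(phrases, extra_terms=None):
    """Join quoted phrases with OR, then AND the group with any extra terms."""
    any_phrase = " OR ".join(f'"{p}"' for p in phrases)
    query = f"({any_phrase})"
    for term in extra_terms or []:
        query += f' AND "{term}"'
    return query


# Keywords listed in Section 3.2.
KEYWORDS = [
    "depression prediction",
    "mental health prediction",
    "anxiety, depression, and stress prediction",
]

# "machine learning" is a hypothetical extra term, not taken from the review.
query = build_query(KEYWORDS, extra_terms=["machine learning"])
print(query)
```

Grouping the alternatives inside parentheses before applying AND keeps the query unambiguous across search engines that give the two operators different precedence.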


3.3. Screening papers 

The papers were examined based on their relevance to our constructed research questions. We
analyzed the titles, abstracts, and keywords to ascertain that they lie within our focus of interest. Then, the
papers were classified into two categories based on the following inclusion (I) and exclusion (E) criteria:
I1: Papers should directly relate to depression prediction using machine learning techniques.
I2: Papers should provide answers to the research questions.
I3: Papers should contain at least one of the search keywords.
E1: Posters, panels, abstracts, presentations, and article summaries.
E2: Duplicates.
E3: Papers without full text.

The initial collection of papers from all electronic databases yielded 73 papers. Since the search on
Google Scholar overlaps with the other databases, duplicates were removed, leaving 50 papers. Next,
32 irrelevant papers were excluded after the title and abstract of each paper were perused. The remaining
18 papers were then read in full; 3 were found irrelevant, and the other 15 papers
were included in this review. Figure 2 shows the screening process.
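
The counts reported above can be checked with a short calculation. The sketch below is illustrative; the helper name `screening_funnel` is hypothetical, and the number of duplicates (23) is not stated in the text but derived here as 73 - 50.

```python
def screening_funnel(initial, exclusions):
    """Return the number of papers remaining after each exclusion step."""
    remaining = [initial]
    for step, removed in exclusions:
        if removed > remaining[-1]:
            raise ValueError(f"{step}: cannot remove {removed} of {remaining[-1]} papers")
        remaining.append(remaining[-1] - removed)
    return remaining


counts = screening_funnel(
    73,
    [
        ("duplicates", 23),          # derived: 73 collected - 50 remaining
        ("title and abstract", 32),  # excluded after title/abstract screening
        ("full text", 3),            # excluded after full reading
    ],
)
print(counts)  # [73, 50, 18, 15]
```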


Collect papers from all electronic databases (n = 73)
-> Remove duplicates (n = 50)
-> Include papers based on title and abstract (n = 18); exclude irrelevant papers (n = 32)
-> Include papers based on full text (n = 15); exclude irrelevant papers (n = 3)
-> Include papers into the review (n = 15)

Figure 2. Paper screening process


4. RESULTS AND DISCUSSION 

The 15 relevant papers included in this review are listed in Table 2 by year, source, scope of
prediction and number of citations. The list suggests that studies on depression prediction were actively
conducted in 2020 (31%) and 2016 (25%). The former is most likely due to the COVID-19 pandemic,
whereas no prominent event could be linked to the latter. In relation to the number of citations, sources
based on computing and technology received a large number of citations since they introduce new
techniques, whereas medical-centered sources are less cited, owing to their more general application
of these techniques. IEEE, a widely known online database, recorded the highest number of cited
sources (ICHI and KDE).


4.1. RQ1: what variables were used by recent proposals in predicting depression? 

To predict depression, researchers use several types of datasets. Some predict depression using
demographic and clinical attributes, while others collect information from social media through text
analytics and hence benefit from textual features instead of attributes. The common variables in
depression prediction found in 6 of the relevant papers are presented in this section. Table 3 shows the
demographic and clinical variables that have been used in past research. Based on previous studies, the
most used variables are age and marital status, followed by gender, educational status, and
socio-economic status. Among clinical variables, diabetes has been used twice in previous studies, while
the others have been used only once, most of them in P3.

Table 2. List of relevant literature

Paper ID | Year | Reference | Source | Scope of prediction | Number of citations
P1 | 2016 | [17] | Biomedical Signal Processing and Control | Depression | 18
P2 | 2016 | [18] | International Journal of Computer Applications | Depression | 21
P3 | 2017 | [19] | Healthcare Technology Letters | Anxiety and depression | 34
P4 | 2017 | [20] | Proceedings-2017 IEEE International Conference on Healthcare Informatics, ICHI 2017 | Depression | 7
P5 | 2017 | [21] | Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence | Depression | 109
P6 | 2018 | [22] | CEUR Workshop Proceedings | Depression and anorexia | 16
P7 | 2019 | [23] | Informatics in Medicine Unlocked | Anxiety and depression | 32
P8 | 2019 | [24] | Journal of Medical Internet Research | Depression | 41
P9 | 2019 | [25] | International Conference on Human Centered Computing | Depression | NA
P10 | 2019 | [26] | International Conference on Advances in Engineering Science Management and Technology (ICAESMT)-2019 | Anxiety, depression, and stress | 16
P11 | 2020 | [27] | IEEE Transactions on Knowledge and Data Engineering | Depression | 85
P12 | 2020 | [28] | Procedia Computer Science | Depression | 20
P13 | 2020 | [29] | Doctoral dissertation, Ecole de technologie supérieure-Superior Technology School | Depression | 1
P14 | 2020 | [30] | Healthcare | Depression | 1
P15 | 2020 | [31] | Journal of Affective Disorders | Anxiety, depression, and stress | 2


Table 3. Variables used by recent proposals

Columns (papers): P2, P3, P4, P7, P14
Rows (variables):
1. Age
2. Gender
3. Residence status
4. Educational status
5. Marital status
6. Income
7. Employment status
8. Socio-economic status
9. Smoking status
10. Drinking
11. Diabetes
12. Hearing problem
13. Visual impairment
14. Mobility impairment
15. Insomnia
16. Stroke



4.2. RQ2: which depression screening tools were adopted? 
Our review discovered 5 screening tools popularly used by past studies in depression prediction: the
geriatric depression scale (GDS), hospital anxiety and depression scale (HADS), patient health questionnaire
(PHQ), Hamilton depression rating scale (HDRS) and depression anxiety stress scale 21 (DASS-21), as listed in
Table 4. We discovered that proposals predicting depression utilize screening tools when their methodology
requires the self-construction of a dataset. This construction is mainly motivated by the absence of an
available dataset necessary to accomplish a study’s unique objective of filling a specific gap in knowledge.
For example, GDS is targeted at screening depression in elders. These tools allow patients to assess
themselves, and ratings are based on this assessment. These self-assessment tools are not meant to replace a
psychiatrist’s diagnosis but instead function as a signpost to the presence of symptoms or reinforce an
earlier diagnosis that a psychiatrist may be considering. Our results show that both HADS and HDRS were
adopted by more studies than PHQ and DASS-21 in relation to the general population. GDS, however, was adopted
when the older population was the subject of interest.

Table 4. Screening tools adopted

Paper ID Screening tools
P1 GDS
P2 GDS
P3 HADS
P4 None
P5 None
P6 None
P7 HADS
P8 None
P9 PHQ
P10 None
P11 None
P12 DASS-21
P13 HDRS
P14 None
P15 HDRS


4.2.1. Geriatric depression scale (GDS) 

GDS [32], [33] consists of 30 questions targeted at the older population aged 65 years and above who
are medically ill. Although other depression screening tools are available, GDS has become the popular tool
for this category of people. GDS simply requires a yes or no answer on how an elder has felt over the past
week. Because of its high sensitivity of 92% and specificity of 89%, GDS is viewed as a valid and reliable
tool. Table 5 shows the severity ratings produced by GDS.


Table 5. GDS severity ratings


Severity Depression
Normal 0-4
Mild 5-8
Moderate 9-11
Severe 12-15
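
To make the sensitivity and specificity figures quoted for GDS concrete, the sketch below computes both from the counts of a confusion matrix. The counts are hypothetical and were chosen only so that the results match the quoted 92% and 89%; they are not taken from any validation study.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly depressed respondents that the tool flags as depressed."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-depressed respondents that the tool correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical sample: 100 depressed and 100 non-depressed respondents.
tp, fn = 92, 8
tn, fp = 89, 11

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```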


4.2.2. Hospital anxiety and depression scale (HADS) 

HADS [34], [35] measures the severity of not only depression but also anxiety. Since its
introduction in 1983, HADS has become a popular screening tool for these two mental conditions.
Comprising 7 questions for anxiety and 7 questions for depression, HADS can be easily completed within
a few minutes. The validity of HADS has been proven, and it is now on the recommendation list of the National
Institute for Health and Care Excellence (NICE) for diagnosing depression and anxiety. Table 6 displays the
HADS severity ratings.


Table 6. HADS severity ratings 


Severity Depression 


Mild 8-10 
Moderate 11-14 
Severe 15-21 


4.2.3. PHQ 

The PHQ [36], [37] is a multipurpose method for screening, tracking, diagnosing, and measuring 
depression severity. It is a self-administered instrument with two distinct types, the PHQ-2 containing two 
items and the PHQ-9 containing nine items. PHQ-2 assesses the frequency of depressive episodes and 
anhedonia for the last two weeks, while PHQ-9 presents a clinical diagnosis of depression and measures the 
severity of symptoms. Table 7 shows the PHQ severity rating. 


Table 7. PHQ severity ratings 


Severity Depression 
Mild 0-5 
Moderate 6-10 
Moderately severe 11-15 
Severe 16-20 


4.2.4. DASS-21

DASS-21 is a set of three self-report scales that determine a patient’s depression, anxiety, and
emotional stress states. The underlying notion is that these states tend to be correlated: anxiety and
depression were found to be comorbid illnesses [38], and depression is a stress-related mental disorder [39].
Each state is measured by 7 questions relating to how a patient has felt over the past week. DASS was
designed to measure the level of negative emotions to help both researchers and clinicians observe a
patient’s condition over time, with the aim of determining the course of treatment.
Table 8 shows the DASS-21 severity ratings.


Table 8. DASS-21 severity ratings 


Severity Depression Anxiety Stress 
Normal 0-9 0-7 0-14 
Mild 10-13 8-9 15-18 
Moderate 14-20 10-14 19-25 
Severe 21-27 15-19 26-33 
Extremely Severe 28+ 20+ 34+ 
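
The cut-offs in Table 8 translate directly into a lookup. The following is a minimal sketch, assuming the subscale score has already been computed (how the 7 item responses are turned into a score is not described here); the names `DASS21_BANDS` and `dass21_severity` are hypothetical.

```python
# Upper bound (inclusive) of each band, per subscale, copied from Table 8.
DASS21_BANDS = {
    "depression": [(9, "Normal"), (13, "Mild"), (20, "Moderate"), (27, "Severe")],
    "anxiety": [(7, "Normal"), (9, "Mild"), (14, "Moderate"), (19, "Severe")],
    "stress": [(14, "Normal"), (18, "Mild"), (25, "Moderate"), (33, "Severe")],
}


def dass21_severity(scale, score):
    """Map a DASS-21 subscale score to its severity label from Table 8."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for upper, label in DASS21_BANDS[scale]:
        if score <= upper:
            return label
    return "Extremely Severe"  # anything above the 'Severe' band


print(dass21_severity("depression", 14))  # Moderate
print(dass21_severity("anxiety", 20))     # Extremely Severe
print(dass21_severity("stress", 14))      # Normal
```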


4.2.5. HDRS 

HDRS [40], [41] specializes in assessing the severity of depression and has also been proven useful
before, during, and after therapy to assess a patient’s level of depression. It is widely perceived as an
effective assessment instrument for hospitalized patients. The HDRS form lists 21 items. Scoring is based on
the first 17 items, with items 18 to 21 used to qualify depression further. Table 9 shows the HDRS severity
rating.


Table 9. HDRS severity ratings 


Severity Depression 
Normal 0-7 
Mild 8-13 
Moderate 14-18 
Severe 19-22 
Very severe 23+ 
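
The scoring rule described above (21 items listed, only the first 17 scored) combined with the bands of Table 9 can be sketched as follows. The function names and the example item values are hypothetical; the sketch does not validate the permitted range of each individual item.

```python
def hdrs_score(item_scores):
    """Sum the first 17 of the 21 HDRS items; items 18-21 are not scored."""
    if len(item_scores) != 21:
        raise ValueError("expected 21 item scores")
    return sum(item_scores[:17])


def hdrs_severity(score):
    """Map an HDRS total to its severity label from Table 9."""
    bands = [(7, "Normal"), (13, "Mild"), (18, "Moderate"), (22, "Severe")]
    for upper, label in bands:
        if score <= upper:
            return label
    return "Very severe"


# Hypothetical responses: the last four values do not affect the total.
items = [1] * 17 + [4] * 4
total = hdrs_score(items)
print(total, hdrs_severity(total))  # 17 Moderate
```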


4.3. RQ3: what machine learning techniques were proposed by existing research?
Table 10 lists the machine learning techniques that were used in past research. For papers that
compare the performance of several techniques, the highest-scoring technique is also listed in the table.
Figure 3 summarizes in a tree map the number of papers using each technique. Most papers experimented on
random forest (RF), support vector machine (SVM), random tree (RT), naive Bayes (NB), logistic regression (LR)
and decision tree (DT). While this indicates the popularity of a specific machine learning technique among
researchers, it is more important to know which of these techniques consistently scores the best performance
when applied over different datasets. Out of the 15 papers reviewed, 12 papers conducted a comparison of
performance. Figure 4 shows RF returning the best performance in 4 instances of these comparisons. RF prevails
against other machine learning techniques across different performance metrics. This is true not only for
classical performance metrics, e.g., accuracy, precision, and recall, but also for newer performance metrics
such as early risk detection error (ERDE), which was formulated specifically for detecting mental illness
early. It is also noteworthy that publications proposing newer machine learning techniques, e.g., the Sons &
Spouses (SS) algorithm, report superseding RF on traditional measurements of performance, specifically
accuracy, F-measure, precision, recall and area under the receiver curve.
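
Since ERDE is less familiar than accuracy or recall, a short sketch may help. The formula is not given in this review; the code below follows the definition commonly used in the early risk detection literature, where a correct positive decision is penalized more the later it is made (after k observed items, relative to a threshold o). The cost values and the example cases are assumptions chosen for illustration.

```python
import math


def latency_cost(k, o):
    """lc_o(k) = 1 - 1 / (1 + exp(k - o)); rises from 0 towards 1 as k passes o."""
    x = max(-700.0, min(700.0, float(k - o)))  # clamp to avoid overflow in exp
    return 1.0 - 1.0 / (1.0 + math.exp(x))


def erde(decision, truth, k, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Error for one subject: decision/truth are booleans, k is items seen."""
    if decision and not truth:
        return c_fp                        # false positive
    if not decision and truth:
        return c_fn                        # false negative
    if decision and truth:
        return latency_cost(k, o) * c_tp   # true positive, penalized by delay
    return 0.0                             # true negative


def mean_erde(cases, o, c_fp):
    """Average ERDE over (decision, truth, k) tuples."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in cases) / len(cases)


# Hypothetical decisions for four subjects; c_fp = 0.1 is an assumed cost.
cases = [(True, True, 5), (True, False, 2), (False, True, 10), (False, False, 10)]
print(round(mean_erde(cases, o=5, c_fp=0.1), 3))  # 0.4
```

Lower values are better, and the subscript attached to ERDE denotes the threshold o.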
Nomenclature:

- ADA: AdaBoost
- BA: Bagging
- BN: BayesNet
- CNN: convolutional neural networks
- DT: decision tree
- GB: gradient boosting
- KNN: K-nearest neighbor
- LR: logistic regression
- MDL: multimodal depressive dictionary learning
- MLP: multi-layer perceptron
- MSNL: multiple social networking learning
- NB: naive Bayes
- NN: neural network
- RF: random forest
- RT: random tree
- RSS: random subspace
- SMO: sequential minimal optimization
- SS: Sons & Spouses
- SVM: support vector machine
- WDL: Wasserstein dictionary learning


Table 10. Proposed machine learning techniques

Paper ID | Machine learning techniques used | Best technique | Performance metrics (best performance)
P1 | RF, RT, MLP, and SVM | RF | Accuracy 95.45; Mean absolute error 0.12; Root mean squared error 0.22; Relative absolute error 24.30; Root relative squared error 44.79
P2 | BN, LR, MLP, SMO, and decision table | BN | Accuracy 91.67; Precision 0.92; ROC area 0.98; Root mean squared error 0.25
P3 | BN, LR, MLP, NB, RF, RT, DT, random optimization, sequential, random sub-space, and K star | RF | Accuracy 89; True positive rate 89; False positive rate 10.9; Precision/positive prediction value 89.1; F-measures 89; Area under the receiver curve 94.3
P4 | Stacking of LR, DT, NBN, NN, SVM | LR (base-level learner) with DT, NBN, NN, SVM (meta-level learner) | Mean area under the receiver curve 75; Mean accuracy 86
P5 | NB, MSNL, WDL and MDL | MDL | Precision 84; Recall 85; F1-measure 84; Accuracy 84
P6 | CNN with TF-IDF information | Not compared | ERDE5 10.81; ERDE50 9.22; F-score 37
P7 | CatBoost, LR, NB, RF, and SVM | CatBoost | Accuracy 89; Precision 84
P8 | DT, RT, and RF | RF | ERDE5 18.51; ERDE50 15.20; F-measure 20; Precision 12; Recall 0
P9 | BN, SVM, SMO, RT, and DT | BN | Accuracy 77.8
P10 | NB, RF, GB, and Ensemble Vote Classifier | Ensemble Vote Classifier | Accuracy 85; F-score 76.9
P11 | CNN | Not compared | ERDE+zo 9.46; ERDE29 TAT; F-latency 0.45
P12 | NB, RF, DT, SVM, and KNN | RF | Accuracy 79.8; Error rate 0.20; Precision 88.1; Recall 67.8; Specificity 91.0; F1 score 76.6
P13 | SVM, RT, and RF | RT | Accuracy 91.3; Recall 91.2; Precision 91.3
P14 | SS, TAN, LR, DT, NN, SVM, ADA, BA, RF, RSS | SS | F measure 91.8; Accuracy 93.0; Area under the receiver curve 76.9; Precision 93.1; Recall 90.6
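
The statement that 12 papers compared techniques and that RF came out best in 4 of them can be re-derived from the "Best technique" column of Table 10. The sketch below is a simple tally; it assumes the second row of the table is paper P2 and abbreviates the P4 entry.

```python
from collections import Counter

# "Best technique" column of Table 10; None marks papers without a comparison.
BEST_TECHNIQUE = {
    "P1": "RF", "P2": "BN", "P3": "RF", "P4": "Stacking (LR base-level)",
    "P5": "MDL", "P6": None, "P7": "CatBoost", "P8": "RF", "P9": "BN",
    "P10": "Ensemble Vote Classifier", "P11": None, "P12": "RF",
    "P13": "RT", "P14": "SS",
}

compared = {paper: tech for paper, tech in BEST_TECHNIQUE.items() if tech is not None}
wins = Counter(compared.values())

print(len(compared))        # 12
print(wins.most_common(2))  # [('RF', 4), ('BN', 2)]
```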


Depression prediction using machine learning: a review (Hanis Diyana Abdul Rahimapandi) 


ISSN: 2252-8938


Figure 3. The number of papers using the proposed technique 

Figure 4. Techniques with consistently high performance


5. CONCLUSION 

In this paper, we have reviewed depression prediction literature from 2016 to 2020 that used
machine learning techniques. We employed the SMS method, and a total of 15 works were found relevant to the
research questions constructed. The research questions focus on three important aspects of predicting
depression using machine learning: the variables used in the literature, the screening tools adopted, and the
machine learning techniques applied, together with the metrics employed to measure each technique’s
performance and the highest values achieved by the top-performing techniques. Our review has led us to
conclude that information on age, marital status, gender, educational status, and socio-economic status is
repeatedly used across the proposals. In addition, most of the works which made use of depression screening
tools relied on self-reporting types. Furthermore, random forest was not only the most popular machine
learning algorithm among researchers but also returned the best performance most often, including on newer
performance metrics, e.g., ERDE. It is expected that this survey will inform researchers about the latest
machine learning techniques, performance measurements and variables used in predicting depression.